Import¶
import warnings
from sklearn.exceptions import UndefinedMetricWarning
warnings.filterwarnings("ignore", category=UndefinedMetricWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
import plotly
plotly.offline.init_notebook_mode()
from file_py.run_log_parser import RunLogParser
from file_py.csv_preprocessing_scaler import CsvPreprocessingScaler
from file_py.plots import Plots
from file_py.utils import MarkdownHelper
from file_py.attack_log_unification import AttackLogUnification
from file_py.stat_severity import StatSeverity
from file_py.attack_pattern_analyzer import AttackPatternAnalyzer
from file_py.signatures_patterns import SignaturePatterns
from file_py.signature_stats_calculator import SignatureStatsCalculator
from file_py.sigma_rule_analysis import SigmaRuleAnalysis
from file_py.plots_single_attack import PlotsSingleAttack
from file_py.correlation_matrix_plots import CorrelationMatrixPlots
from file_py.preprocessing_train_test_split import PreprocessingTrainTestSplit
from file_py.initial_training import InitialTraining
from file_py.hyperparameter_tuning import HyperparameterTuning
from file_py.advanced_models import AdvancedModels
from file_py.deep_learning_model import DeepLearningModel
from file_py.model_evaluator import ModelEvaluator
FILE LOADING¶
Replace the current file paths with the paths to the files of interest here:
# FILE CONTAINING THE LOGS
df = CsvPreprocessingScaler.read_csv_file("file_csv/LogSplunkWF_03_07.csv")
# FILE WITH THE START AND END DATES OF THE ATTACKS
files = ['file_csv/attackLog_03_07.csv']
Preprocessing¶
df_raw = CsvPreprocessingScaler.RawPreprocessing(df)
df_Le = CsvPreprocessingScaler.LEPreprocessing(df)
df_OH = CsvPreprocessingScaler.OhePreprocessing(df)
df_std_LE = CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.LEPreprocessing(df))
df_std_OH = CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.OhePreprocessing(df))
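The project's CsvPreprocessingScaler wraps these transforms; as a rough, assumption-laden sketch of the three variants it exposes (label encoding, one-hot encoding, standardization on plain pandas dataframes — the real implementation may differ):

```python
import pandas as pd

def label_encode(df):
    # Map each categorical column to integer codes (alphabetical category order).
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].astype("category").cat.codes
    return out

def one_hot_encode(df):
    # One indicator column per category value.
    return pd.get_dummies(df)

def std_scale(df):
    # Standardize every column to zero mean and unit variance.
    return (df - df.mean()) / df.std(ddof=0)

# Hypothetical two-column log fragment for illustration.
demo = pd.DataFrame({"signature": ["b", "a", "b"], "severity_id": [1, 2, 3]})
demo_le = label_encode(demo)
demo_oh = one_hot_encode(demo)
demo_scaled = std_scale(demo_le)
```

This mirrors how df_std_LE and df_std_OH are built above: an encoding step followed by standard scaling.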
Test¶
attack_log_path = AttackLogUnification.attack_log_together(files,'file_csv/attackLog_03_07.csv')
result_df_Le = RunLogParser.process_attacks(attack_log_path, CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.LEPreprocessing(df)))
result_df_OH = RunLogParser.process_attacks(attack_log_path, CsvPreprocessingScaler.stdScaler(CsvPreprocessingScaler.OhePreprocessing(df)))
result_df_Raw = RunLogParser.process_attacks(attack_log_path, CsvPreprocessingScaler.RawPreprocessing(df))
Graphic Analysis of Attacks¶
Plots.plot_cake_attack(result_df_Raw)
Plots.plot_top_10_signatures(result_df_Raw)
Here we can see that, in general, the rules that fired the most times are also the ones that actually responded to the most attacks, and the ones that fired without a match the most times.
Plots.plot_precision_recall(result_df_Raw)
The first plot shows each rule's precision, i.e. the proportion of correct activations over the rule's total activations.
Higher precision means the rule is more accurate at detecting real attacks.
The second plot shows recall, i.e. the proportion of real attacks detected by the rule over the total number of real attacks.
Higher recall means the rule is more effective at detecting all possible attacks.
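The two metrics can be sketched directly from the event log, assuming (as in the dataframes above) a "signature" column and a binary "corrisponde_ad_attacco" label; the actual plot logic lives in Plots.plot_precision_recall:

```python
import pandas as pd

def rule_precision_recall(df, rule_col="signature", label_col="corrisponde_ad_attacco"):
    # Precision: correct activations / total activations of the rule.
    # Recall: real attacks the rule fired on / total real attacks in the log.
    total_attacks = int(df[label_col].sum())
    out = {}
    for rule, grp in df.groupby(rule_col):
        tp = int(grp[label_col].sum())
        precision = tp / len(grp)
        recall = tp / total_attacks if total_attacks else 0.0
        out[rule] = (precision, recall)
    return out

# Tiny synthetic log for illustration.
log = pd.DataFrame({
    "signature": ["a", "a", "b", "b", "b"],
    "corrisponde_ad_attacco": [1, 0, 1, 1, 0],
})
stats = rule_precision_recall(log)
```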
Plots.plot_distributions(result_df_Raw)
Plots.plot_value_counts_per_unique(result_df_Raw)
variables = MarkdownHelper.create_value_counts_variables(result_df_Raw)
MarkdownHelper.display_value_counts_text(variables)
This plot, on the other hand, lets us draw a number of conclusions.
Out of 31 distinct rules:
22 fired in response to AT LEAST one real attack. Of these:
- 4 fired more often for non-attacks than for attacks (generic rules);
- 6 fired the same number of times for attacks and non-attacks;
- 12 fired more often in response to attacks than to non-attacks (specific rules).
9 fired without ever responding to an attack.
These are: ['load-of-dbghelp/dbgcore-dll-from-suspicious-process', 'proc-start-suspicious-wmiprvse-child-process', 'proc-start-wmiservice-child', 'proc-start-cobaltstrike-load-by-rundll32', 'proc-start-powershell-base64-encoded-invoke-keyword', 'proc-start-hacktool-mimikatz-execution', 'proc-start-suspicious-powershell-parameter-substring', 'proc-start-lolbas-compile', 'proc-start-suspicious-process-created-via-wmic.exe']
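The generic/balanced/specific split above can be reproduced by comparing attack and non-attack activation counts per rule; a minimal sketch, assuming the same "signature" and "corrisponde_ad_attacco" columns:

```python
import pandas as pd

def classify_rules(df, rule_col="signature", label_col="corrisponde_ad_attacco"):
    # Bucket each rule by how its attack activations compare
    # to its non-attack activations.
    buckets = {"generic": [], "balanced": [], "specific": [], "never_matched": []}
    for rule, grp in df.groupby(rule_col):
        attacks = int(grp[label_col].sum())
        non_attacks = len(grp) - attacks
        if attacks == 0:
            buckets["never_matched"].append(rule)
        elif non_attacks > attacks:
            buckets["generic"].append(rule)
        elif non_attacks == attacks:
            buckets["balanced"].append(rule)
        else:
            buckets["specific"].append(rule)
    return buckets

# Synthetic example: r1 is generic, r2 specific, r3 never matched an attack.
log = pd.DataFrame({
    "signature": ["r1", "r1", "r1", "r2", "r2", "r3", "r3"],
    "corrisponde_ad_attacco": [1, 0, 0, 1, 1, 0, 0],
})
buckets = classify_rules(log)
```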
Analysis of Severity per Attack¶
event_df = RunLogParser.create_event_df(attack_log_path, result_df_Raw)
Creation of the event_df dataframe with the new severity_max, _min and _mean columns
Plots¶
StatSeverity.plot_stat_severity(event_df)
This plot shows, for each attack in the dataset, its maximum, minimum and mean severity.
analyzer = AttackPatternAnalyzer(event_df)
# CHOOSE A SEVERITY VALUE FOR THE RULES TO CONSIDER
severity_value=73
# CHOOSE HOW MANY ATTACKS TO CONSIDER BEFORE THE RULES WITH THE CHOSEN SEVERITY
num_attacks=10
analyzer.pattern_before_attack(num_attacks=num_attacks, severity_value=severity_value)
In these plots we consider the attacks preceding all attacks that have a given mean severity, and display all the corresponding values of "RuleAnnotation.mitre_attack.id", "signature", "EventType", "tag" and "severity_id".
To choose how many attacks before the ones of interest to consider, set the "num_attacks" variable to the desired number;
to choose the mean severity value of interest, set the "severity_value" variable.
All attacks whose mean severity falls within 2.5 below and 2.5 above the value assigned to "severity_value" are taken into account.
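The selection step can be sketched as follows, under the assumption that event_df carries a "severity_mean" column and that pattern_before_attack slices the preceding rows positionally (the real logic lives in AttackPatternAnalyzer):

```python
import pandas as pd

def events_before_matching_attacks(event_df, severity_value, num_attacks,
                                   sev_col="severity_mean", tol=2.5):
    # For every attack whose mean severity lies within +/- tol of
    # severity_value, collect the num_attacks rows preceding it.
    windows = []
    mask = (event_df[sev_col] - severity_value).abs() <= tol
    for idx in event_df.index[mask]:
        pos = event_df.index.get_loc(idx)
        windows.append(event_df.iloc[max(0, pos - num_attacks):pos])
    return windows

# Synthetic severity series; rows at mean 72 and 74 match severity_value=73.
events = pd.DataFrame({"severity_mean": [10, 20, 72, 40, 74]})
windows = events_before_matching_attacks(events, severity_value=73, num_attacks=2)
```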
Rule Robustness¶
signature_stats = SignatureStatsCalculator.create_signature_stats(event_df, result_df_Raw)
signature_stats
| | signature | Indice_Diff | Media_Differenza_Severity_min | Media_Differenza_Severity_mean | Media_Differenza_Severity_max | N_Max_Sev_Diff_15 | N_Attacchi_Non_rilevati |
|---|---|---|---|---|---|---|---|
| 0 | net-connect-80-443-non-browser | 0.026178 | -4.807692 | -0.692819 | 0.000000 | 0 | 0 |
| 1 | net-connect-Windows-processes | 0.002101 | 0.000000 | -0.121006 | 0.000000 | 0 | 0 |
| 2 | net-connect-suspicious-target-names | 0.001212 | 0.000000 | 0.148516 | 0.000000 | 0 | 0 |
| 3 | proc-start-suspicious-powershell-download-and-... | 0.005523 | 0.000000 | 0.148516 | 0.000000 | 0 | 0 |
| 4 | proc-start-powershell-download-and-execution-c... | 0.007495 | 0.000000 | 0.175519 | 0.000000 | 0 | 4 |
| 5 | proc-start-dumping-of-sensitive-hives-via-reg.exe | 0.002637 | 0.000000 | 0.545410 | 0.000000 | 0 | 0 |
| 6 | proc-start-dir-user-writeable | 0.023308 | -3.125000 | -0.893017 | 0.000000 | 0 | 2 |
| 7 | suspicious-unsigned-dbghelp/dbgcore-dll-loaded | 0.045368 | -1.041667 | 3.040575 | 3.125000 | 2 | 2 |
| 8 | reg-key-create-service | 0.002792 | 0.000000 | -0.025202 | 0.000000 | 0 | 0 |
| 9 | proc-start-malicious-powershell-commandlets-pr... | 0.013459 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 10 | proc-start-potential-meterpreter/cobaltstrike-... | 0.000770 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 11 | proc-start-abused-debug-privilege-by-arbitrary... | 0.000562 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 12 | proc-start-potential-cobaltstrike-process-patt... | 0.000370 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 13 | proc-start-suspicious-new-service-creation | 0.000562 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 14 | proc-start-lolbas-compile | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 15 | proc-start-lolbas-alternate-data-streams | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 2 |
| 16 | proc-start-potential-winapi-calls-via-commandline | 0.015191 | 0.000000 | 0.694444 | 1.041667 | 1 | 2 |
| 17 | reg-value-write-cert-change | 0.022747 | 0.000000 | 2.264957 | 2.884615 | 2 | 0 |
| 18 | proc-start-hacktool-mimikatz-execution | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 19 | proc-start-cobaltstrike-load-by-rundll32 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 20 | proc-start-suspicious-wmiprvse-child-process | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 21 | proc-start-wmiservice-child | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 22 | proc-start-suspicious-process-created-via-wmic... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 23 | proc-start-powershell-base64-encoded-invoke-ke... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 24 | proc-start-suspicious-powershell-parameter-sub... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 25 | proc-start-rundll32-execution-without-dll-file | 0.001923 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 26 | proc-start-suspicious-key-manager-access | 0.001923 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 27 | proc-start-potentially-suspicious-powershell-c... | 0.001923 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 28 | load-of-dbghelp/dbgcore-dll-from-suspicious-pr... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 29 | proc-start-process-memory-dump-via-comsvcs.dll | 0.004006 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
| 30 | proc-start-potential-credential-dumping-attemp... | 0.002564 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 |
signature_stats is a dataset showing, for each rule, what severity changes its removal would cause in the log dataset and whether any attacks would then go undetected.
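The N_Attacchi_Non_rilevati column can be approximated with a leave-one-out count; a minimal sketch, assuming hypothetical "signature", "corrisponde_ad_attacco" and "attack_id" columns (the real computation lives in SignatureStatsCalculator.create_signature_stats):

```python
import pandas as pd

def leave_one_out_missed_attacks(df, rule_col="signature",
                                 label_col="corrisponde_ad_attacco",
                                 attack_col="attack_id"):
    # For each rule, count the attacks that no other rule fires on,
    # i.e. attacks that would go undetected if the rule were removed.
    hits = df[df[label_col] == 1]
    detected = set(hits[attack_col])
    missed = {}
    for rule in df[rule_col].unique():
        remaining = hits[hits[rule_col] != rule]
        missed[rule] = len(detected - set(remaining[attack_col]))
    return missed

# Attack 1 is covered by both rules; attack 2 only by r1.
log = pd.DataFrame({
    "signature": ["r1", "r2", "r1"],
    "corrisponde_ad_attacco": [1, 1, 1],
    "attack_id": [1, 1, 2],
})
missed = leave_one_out_missed_attacks(log)
```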
analysis = SigmaRuleAnalysis(signature_stats)
analysis.plots_sigma_rule_analysis()
Graphic Analysis of Attacks for Chosen Rule¶
# CHOOSE THE RULE TO ANALYZE
regola_scelta = 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded'
PlotsSingleAttack.analyze_rule_activations(result_df_Raw, regola_scelta)
In these plots, for the chosen rule, we can visualize:
- the frequency of the rule's activations (attacks and non-attacks), split into 5-minute intervals;
- attacks and non-attacks broken down by:
  - RuleAnnotation.mitre_attack.id
  - EventType
  - severity
  - tag
  - parent_process_id
  - process_id
# CHOOSE THE NUMBER OF EVENTS BEFORE THE RULE'S ACTIVATIONS TO ANALYZE
eventi_da_considerare = 5
PlotsSingleAttack.patterns_before_activation(result_df_Raw, regola_scelta, eventi_da_considerare)
These plots show, respectively, the rules, attacks, EventTypes, tags, parent_processes, processes and severities of the events immediately before the first activations of the chosen rule.
The number of events to consider is chosen by assigning the desired number to the eventi_da_considerare variable.
A "first activation of a rule" is when at least one element of the signature, RuleAnnotation.mitre_attack.id, EventType, tag, severity_id, parent_process_id or process_id columns of an event differs from the previous event (only the _time and corrisponde_ad_attacco columns are excluded).
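That change-detection criterion can be sketched with a shifted comparison; a minimal version assuming a plain pandas dataframe (the real logic lives in PlotsSingleAttack):

```python
import pandas as pd

def first_activation_mask(df, ignore=("_time", "corrisponde_ad_attacco")):
    # Mark rows where any column outside `ignore` differs from the
    # previous row; the first row always counts as a first activation.
    cols = [c for c in df.columns if c not in ignore]
    changed = df[cols].ne(df[cols].shift()).any(axis=1)
    changed.iloc[0] = True
    return changed

# Synthetic log: row 1 repeats row 0, rows 2 and 3 each change a column.
df = pd.DataFrame({
    "_time": [1, 2, 3, 4],
    "signature": ["a", "a", "b", "b"],
    "severity_id": [5, 5, 5, 7],
})
mask = first_activation_mask(df)
```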
Patterns¶
signature_patterns = SignaturePatterns.recognize_signatures_patterns(result_df_Raw)
signature_patterns
Pattern: ('proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 252
Pattern: ('proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 248
Pattern: ('proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 244
Pattern: ('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'), Frequenza: 26
Pattern: ('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'), Frequenza: 25
Pattern: ('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'), Frequenza: 24
Pattern: ('reg-value-write-cert-change', 'reg-value-write-cert-change'), Frequenza: 9
Pattern: ('reg-value-write-cert-change', 'reg-value-write-cert-change', 'reg-value-write-cert-change'), Frequenza: 7
Pattern: ('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles'), Frequenza: 6
Pattern: ('reg-value-write-cert-change', 'reg-value-write-cert-change', 'reg-value-write-cert-change', 'reg-value-write-cert-change'), Frequenza: 5
Pattern: ('proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation', 'proc-start-malicious-powershell-commandlets-processcreation'), Frequenza: 4
Pattern: ('proc-start-dir-user-writeable', 'proc-start-dir-user-writeable'), Frequenza: 3
Pattern: ('proc-start-potential-meterpreter/cobaltstrike-activity', 'proc-start-potential-meterpreter/cobaltstrike-activity'), Frequenza: 3
In signature_patterns we see the sequences of 3, 4 or 5 rules, ordered from most to least frequent, that repeated multiple times across the various attacks and that never appear among the false-attack sequences.
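This kind of mining can be sketched as contiguous n-gram counting over the signature sequence, keeping only patterns absent from the non-attack sequence; a minimal sketch of the idea (the details of SignaturePatterns.recognize_signatures_patterns are assumed):

```python
from collections import Counter

def signature_ngrams(seq, n_min=3, n_max=5):
    # Count every contiguous sub-sequence of length n_min..n_max.
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

def patterns_only_in_attacks(attack_seq, benign_seq, n_min=3, n_max=5):
    # Keep attack-time patterns that never occur in the benign sequence.
    attack_counts = signature_ngrams(attack_seq, n_min, n_max)
    benign = set(signature_ngrams(benign_seq, n_min, n_max))
    return {p: c for p, c in attack_counts.items() if p not in benign}

# Toy example with bigrams: ('b', 'a') also occurs in benign traffic,
# so only ('a', 'b') survives.
pats = patterns_only_in_attacks(["a", "b", "a", "b"], ["b", "a"],
                                n_min=2, n_max=2)
```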
With specified severity value¶
result_pattern_inside_attack = analyzer.pattern_inside_attack(severity_value=severity_value)
result_pattern_inside_attack
MITRE ATT&CK IDs:
1-digit repetitions:
2-digits sequences:
('T1036', 'T1036'): 1
('T1003.001', 'T1003.001'): 1
3-digits sequences:
('T1059.001', 'T1059', 'T1482'): 2
('T1003.001', 'T1003.001', 'T1003.001'): 2
('T1218.011', 'T1555.004', 'T1059.001'): 1
('T1003.001', 'T1003.001', 'T1106'): 1
('T1134.001', 'T1134.001', 'T1134.002'): 1
('T1003.002', 'T1003.002', 'T1003.002'): 1
SIGNATURES:
1-digit repetitions:
2-digits sequences:
('proc-start-process-memory-dump-via-comsvcs.dll', 'proc-start-process-memory-dump-via-comsvcs.dll'): 1
('suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded'): 1
3-digits sequences:
('proc-start-suspicious-powershell-download-and-execute-pattern', 'proc-start-powershell-download-and-execution-cradles', 'proc-start-malicious-powershell-commandlets-processcreation'): 2
('suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded'): 2
('proc-start-rundll32-execution-without-dll-file', 'proc-start-suspicious-key-manager-access', 'proc-start-potentially-suspicious-powershell-child-processes'): 1
('suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'suspicious-unsigned-dbghelp/dbgcore-dll-loaded', 'proc-start-potential-winapi-calls-via-commandline'): 1
('proc-start-potential-meterpreter/cobaltstrike-activity', 'proc-start-potential-meterpreter/cobaltstrike-activity', 'proc-start-potential-meterpreter/cobaltstrike-activity'): 1
('proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe', 'proc-start-dumping-of-sensitive-hives-via-reg.exe'): 1
In result_pattern_inside_attack we see:
- '1-digit repetitions', corresponding to the repetitions of mitre_attack.id and signature at the head of attacks with at most 3 recorded mitre IDs or signatures;
- '2-digits sequences', corresponding to the sequences of mitre_attack.id and signature at the head of attacks with between 4 and 5 mitre IDs or signatures;
- '3-digits sequences', corresponding to the sequences of mitre_attack.id and signature at the head of attacks with more than 5 mitre IDs or signatures (exclusive).
Correlation Matrix¶
CorrelationMatrixPlots.plot_correlation_matrix(result_df_Le, 'Correlation Matrix (Label Encoding)')
CorrelationMatrixPlots.plot_correlation_matrix_big(result_df_OH, 'Correlation Matrix (OneHot Encoding)')
ML¶
OneHot¶
# Split data
X_train_OH, X_test_OH, y_train_OH, y_test_OH = PreprocessingTrainTestSplit.split_data(result_df_OH, "corrisponde_ad_attacco")
# Initial model training and evaluation
InitialTraining.train_and_evaluate_initial_models(X_train_OH, y_train_OH, X_test_OH, y_test_OH)
# Hyperparameter tuning
best_models_OH = HyperparameterTuning.tune_hyperparameters(X_train_OH, y_train_OH)
# Evaluate best models on test set
evaluator_OH = ModelEvaluator(best_models_OH)
evaluation_results_OH = evaluator_OH.evaluate_models(X_test_OH, y_test_OH)
# Train XGBoost model
AdvancedModels.train_xgboost(X_train_OH, y_train_OH, X_test_OH, y_test_OH)
# Train deep learning model
DeepLearningModel.train_deep_learning_model(X_train_OH, y_train_OH, X_test_OH, y_test_OH)
Decision Tree Classification Report:
precision recall f1-score support
0 0.84 0.67 0.74 39
1 0.89 0.95 0.92 111
accuracy 0.88 150
macro avg 0.86 0.81 0.83 150
weighted avg 0.88 0.88 0.88 150
AdaBoost Classification Report:
precision recall f1-score support
0 0.79 0.59 0.68 39
1 0.87 0.95 0.91 111
accuracy 0.85 150
macro avg 0.83 0.77 0.79 150
weighted avg 0.85 0.85 0.85 150
XGBoost Classification Report:
precision recall f1-score support
0 0.85 0.72 0.78 39
1 0.91 0.95 0.93 111
accuracy 0.89 150
macro avg 0.88 0.84 0.85 150
weighted avg 0.89 0.89 0.89 150
CatBoost Classification Report:
precision recall f1-score support
0 0.87 0.67 0.75 39
1 0.89 0.96 0.93 111
accuracy 0.89 150
macro avg 0.88 0.82 0.84 150
weighted avg 0.89 0.89 0.88 150
MLP Classification Report:
precision recall f1-score support
0 0.26 1.00 0.41 39
1 0.00 0.00 0.00 111
accuracy 0.26 150
macro avg 0.13 0.50 0.21 150
weighted avg 0.07 0.26 0.11 150
Quadratic Discriminant Analysis Classification Report:
precision recall f1-score support
0 0.41 0.92 0.57 39
1 0.95 0.53 0.68 111
accuracy 0.63 150
macro avg 0.68 0.73 0.62 150
weighted avg 0.81 0.63 0.65 150
Extra Trees Classification Report:
precision recall f1-score support
0 0.84 0.67 0.74 39
1 0.89 0.95 0.92 111
accuracy 0.88 150
macro avg 0.86 0.81 0.83 150
weighted avg 0.88 0.88 0.88 150
Best parameters for Random Forest: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}
Best F1-score: 0.9421461328806625
Best parameters for Gradient Boosting: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
Best F1-score: 0.9409593851756958
Best parameters for Naive Bayes: {}
Best F1-score: 0.2488116446766236
Best parameters for KNN: {'knn__metric': 'manhattan', 'knn__n_neighbors': 7, 'knn__weights': 'distance'}
Best F1-score: 0.9712771479423694
Best parameters for Logistic Regression: {'logreg__C': 1, 'logreg__solver': 'lbfgs'}
Best F1-score: 0.9084881053802359
Random Forest Classification Report:
precision recall f1-score support
0 0.84 0.67 0.74 39
1 0.89 0.95 0.92 111
accuracy 0.88 150
macro avg 0.86 0.81 0.83 150
weighted avg 0.88 0.88 0.88 150
Gradient Boosting Classification Report:
precision recall f1-score support
0 0.84 0.67 0.74 39
1 0.89 0.95 0.92 111
accuracy 0.88 150
macro avg 0.86 0.81 0.83 150
weighted avg 0.88 0.88 0.88 150
Naive Bayes Classification Report:
precision recall f1-score support
0 0.30 0.92 0.45 39
1 0.90 0.24 0.38 111
accuracy 0.42 150
macro avg 0.60 0.58 0.42 150
weighted avg 0.74 0.42 0.40 150
KNN Classification Report:
precision recall f1-score support
0 0.87 0.87 0.87 39
1 0.95 0.95 0.95 111
accuracy 0.93 150
macro avg 0.91 0.91 0.91 150
weighted avg 0.93 0.93 0.93 150
Logistic Regression Classification Report:
precision recall f1-score support
0 0.77 0.69 0.73 39
1 0.90 0.93 0.91 111
accuracy 0.87 150
macro avg 0.83 0.81 0.82 150
weighted avg 0.86 0.87 0.86 150
[0] train-auc:0.82316 eval-auc:0.82121
[1] train-auc:0.82316 eval-auc:0.82121
[2] train-auc:0.82466 eval-auc:0.82144
[3] train-auc:0.83513 eval-auc:0.82040
[4] train-auc:0.83513 eval-auc:0.82040
[5] train-auc:0.84525 eval-auc:0.86232
[6] train-auc:0.84442 eval-auc:0.85285
[7] train-auc:0.84344 eval-auc:0.85054
[8] train-auc:0.84471 eval-auc:0.85331
[9] train-auc:0.84471 eval-auc:0.85331
[10] train-auc:0.84934 eval-auc:0.86498
[11] train-auc:0.84951 eval-auc:0.85505
[12] train-auc:0.84995 eval-auc:0.85643
[13] train-auc:0.84999 eval-auc:0.85643
[14] train-auc:0.84999 eval-auc:0.85643
[15] train-auc:0.85039 eval-auc:0.87052
[16] train-auc:0.85077 eval-auc:0.87168
[17] train-auc:0.85314 eval-auc:0.87156
[18] train-auc:0.85306 eval-auc:0.87156
[19] train-auc:0.85306 eval-auc:0.87156
[20] train-auc:0.85306 eval-auc:0.87156
[21] train-auc:0.85162 eval-auc:0.86394
[22] train-auc:0.85400 eval-auc:0.85840
[23] train-auc:0.85060 eval-auc:0.85944
[24] train-auc:0.85049 eval-auc:0.85944
[25] train-auc:0.85580 eval-auc:0.85840
Accuracy: 85.33%
ROC AUC: 0.86
precision recall f1-score support
0 0.77 0.62 0.69 39
1 0.87 0.94 0.90 111
accuracy 0.85 150
macro avg 0.82 0.78 0.80 150
weighted avg 0.85 0.85 0.85 150
Epoch 1/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7420 - loss: 43462084.0000
Epoch 2/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7238 - loss: 2059577.3750
Epoch 3/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4966 - loss: 1675796.1250
Epoch 4/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6978 - loss: 3899460.2500
Epoch 5/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6036 - loss: 712032.1875
Epoch 6/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6346 - loss: 1453109.6250
Epoch 7/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7205 - loss: 2453228.2500
Epoch 8/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5295 - loss: 637491.3125
Epoch 9/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6576 - loss: 2188584.0000
Epoch 10/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6201 - loss: 3108840.0000
Test Accuracy: 0.7400000095367432
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
Classification Report for Deep Learning Model:
precision recall f1-score support
0 0.00 0.00 0.00 39
1 0.74 1.00 0.85 111
accuracy 0.74 150
macro avg 0.37 0.50 0.43 150
weighted avg 0.55 0.74 0.63 150
<Sequential name=sequential, built=True>
Label¶
# Split data
X_train_Le, X_test_Le, y_train_Le, y_test_Le = PreprocessingTrainTestSplit.split_data(result_df_Le, "corrisponde_ad_attacco")
# Initial model training and evaluation
InitialTraining.train_and_evaluate_initial_models(X_train_Le, y_train_Le, X_test_Le, y_test_Le)
# Hyperparameter tuning
best_models_Le = HyperparameterTuning.tune_hyperparameters(X_train_Le, y_train_Le)
# Evaluate best models on test set
evaluator_Le = ModelEvaluator(best_models_Le)
evaluation_results_Le = evaluator_Le.evaluate_models(X_test_Le, y_test_Le)
# Train XGBoost model
AdvancedModels.train_xgboost(X_train_Le, y_train_Le, X_test_Le, y_test_Le)
# Train deep learning model
DeepLearningModel.train_deep_learning_model(X_train_Le, y_train_Le, X_test_Le, y_test_Le)
Decision Tree Classification Report:
precision recall f1-score support
0 0.86 0.79 0.83 39
1 0.93 0.95 0.94 111
accuracy 0.91 150
macro avg 0.90 0.87 0.88 150
weighted avg 0.91 0.91 0.91 150
AdaBoost Classification Report:
precision recall f1-score support
0 0.84 0.54 0.66 39
1 0.86 0.96 0.91 111
accuracy 0.85 150
macro avg 0.85 0.75 0.78 150
weighted avg 0.85 0.85 0.84 150
XGBoost Classification Report:
precision recall f1-score support
0 0.84 0.69 0.76 39
1 0.90 0.95 0.93 111
accuracy 0.89 150
macro avg 0.87 0.82 0.84 150
weighted avg 0.88 0.89 0.88 150
CatBoost Classification Report:
precision recall f1-score support
0 0.87 0.69 0.77 39
1 0.90 0.96 0.93 111
accuracy 0.89 150
macro avg 0.89 0.83 0.85 150
weighted avg 0.89 0.89 0.89 150
MLP Classification Report:
precision recall f1-score support
0 0.00 0.00 0.00 39
1 0.74 1.00 0.85 111
accuracy 0.74 150
macro avg 0.37 0.50 0.43 150
weighted avg 0.55 0.74 0.63 150
Quadratic Discriminant Analysis Classification Report:
precision recall f1-score support
0 0.64 0.69 0.67 39
1 0.89 0.86 0.88 111
accuracy 0.82 150
macro avg 0.77 0.78 0.77 150
weighted avg 0.82 0.82 0.82 150
Extra Trees Classification Report:
precision recall f1-score support
0 0.84 0.69 0.76 39
1 0.90 0.95 0.93 111
accuracy 0.89 150
macro avg 0.87 0.82 0.84 150
weighted avg 0.88 0.89 0.88 150
Best parameters for Random Forest: {'max_depth': 20, 'min_samples_split': 2, 'n_estimators': 100}
Best F1-score: 0.9480167025010606
Best parameters for Gradient Boosting: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100}
Best F1-score: 0.9394801191834763
Best parameters for Naive Bayes: {}
Best F1-score: 0.7503471276198549
Best parameters for KNN: {'knn__metric': 'euclidean', 'knn__n_neighbors': 5, 'knn__weights': 'distance'}
Best F1-score: 0.9781855877828152
Best parameters for Logistic Regression: {'logreg__C': 0.1, 'logreg__solver': 'lbfgs'}
Best F1-score: 0.9124706106218712
Random Forest Classification Report:
precision recall f1-score support
0 0.84 0.69 0.76 39
1 0.90 0.95 0.93 111
accuracy 0.89 150
macro avg 0.87 0.82 0.84 150
weighted avg 0.88 0.89 0.88 150
Gradient Boosting Classification Report:
precision recall f1-score support
0 0.88 0.77 0.82 39
1 0.92 0.96 0.94 111
accuracy 0.91 150
macro avg 0.90 0.87 0.88 150
weighted avg 0.91 0.91 0.91 150
Naive Bayes Classification Report:
precision recall f1-score support
0 0.44 0.72 0.54 39
1 0.87 0.68 0.76 111
accuracy 0.69 150
macro avg 0.65 0.70 0.65 150
weighted avg 0.76 0.69 0.70 150
KNN Classification Report:
precision recall f1-score support
0 0.95 0.90 0.92 39
1 0.96 0.98 0.97 111
accuracy 0.96 150
macro avg 0.96 0.94 0.95 150
weighted avg 0.96 0.96 0.96 150
Logistic Regression Classification Report:
precision recall f1-score support
0 0.80 0.51 0.62 39
1 0.85 0.95 0.90 111
accuracy 0.84 150
macro avg 0.82 0.73 0.76 150
weighted avg 0.84 0.84 0.83 150
[0] train-auc:0.82353 eval-auc:0.82190
[1] train-auc:0.87323 eval-auc:0.79603
[2] train-auc:0.87468 eval-auc:0.79441
[3] train-auc:0.86802 eval-auc:0.78286
[4] train-auc:0.88090 eval-auc:0.82121
[5] train-auc:0.88220 eval-auc:0.84211
[6] train-auc:0.88166 eval-auc:0.84176
[7] train-auc:0.88310 eval-auc:0.85112
[8] train-auc:0.88271 eval-auc:0.84442
[9] train-auc:0.88364 eval-auc:0.85262
[10] train-auc:0.88163 eval-auc:0.85043
[11] train-auc:0.87650 eval-auc:0.84142
[12] train-auc:0.88711 eval-auc:0.85690
[13] train-auc:0.88780 eval-auc:0.85690
[14] train-auc:0.88791 eval-auc:0.85690
[15] train-auc:0.88821 eval-auc:0.86175
[16] train-auc:0.88821 eval-auc:0.86175
[17] train-auc:0.88807 eval-auc:0.86175
[18] train-auc:0.88810 eval-auc:0.85759
[19] train-auc:0.88900 eval-auc:0.86221
[20] train-auc:0.88889 eval-auc:0.86198
[21] train-auc:0.88903 eval-auc:0.86198
[22] train-auc:0.88952 eval-auc:0.85874
[23] train-auc:0.88908 eval-auc:0.86221
[24] train-auc:0.88873 eval-auc:0.86221
[25] train-auc:0.88874 eval-auc:0.86175
[26] train-auc:0.89051 eval-auc:0.85251
[27] train-auc:0.89013 eval-auc:0.85228
[28] train-auc:0.88966 eval-auc:0.85505
Accuracy: 84.00%
ROC AUC: 0.85
precision recall f1-score support
0 0.69 0.69 0.69 39
1 0.89 0.89 0.89 111
accuracy 0.84 150
macro avg 0.79 0.79 0.79 150
weighted avg 0.84 0.84 0.84 150
Epoch 1/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7210 - loss: 35000396.0000
Epoch 2/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6672 - loss: 5387963.0000
Epoch 3/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5509 - loss: 1828111.8750
Epoch 4/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6714 - loss: 2025396.5000
Epoch 5/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6830 - loss: 3374135.5000
Epoch 6/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6411 - loss: 1886459.8750
Epoch 7/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6435 - loss: 2256157.0000
Epoch 8/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5381 - loss: 1076551.0000
Epoch 9/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5985 - loss: 1026674.2500
Epoch 10/10
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7046 - loss: 2105922.5000
Test Accuracy: 0.7400000095367432
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
Classification Report for Deep Learning Model:
precision recall f1-score support
0 0.00 0.00 0.00 39
1 0.74 1.00 0.85 111
accuracy 0.74 150
macro avg 0.37 0.50 0.43 150
weighted avg 0.55 0.74 0.63 150
<Sequential name=sequential_1, built=True>
evaluator_OH.print_best_model('OneHot Encoder')
evaluator_Le.print_best_model('Label Encoder')
After encoding with the OneHot Encoder the best model was KNN with a score of 0.9134. After encoding with the Label Encoder the best model was KNN with a score of 0.9471.
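Selecting the winner comes down to an argmax over the per-model scores; a minimal sketch of what print_best_model presumably does (the score dictionary below is hypothetical):

```python
def best_model(scores):
    # Return the (name, score) pair with the highest score.
    name = max(scores, key=scores.get)
    return name, scores[name]

name, score = best_model({"KNN": 0.9471, "Random Forest": 0.9480167 - 0.01,
                          "Gradient Boosting": 0.9105})
```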